feat: implement modules #297
Conversation
Force-pushed from 00ec1a0 to 517b684
This is a partial review. I'm not quite done going through the AST yet and haven't looked at tests/lexer/grammar changes, etc, so some of these comments/questions might be handled elsewhere, in which case just clarifying the comments a bit would help.
One question - so far it looks to me like we may have lost the functionality for applying selectors to constraints?
@grjte Support for constraint selectors is still there, just wrapped up in the |
I may be missing some context here - but I think the syntax is quite different:
Am I missing something?
Hm ok - I did see that it was in there. I'll look at this more to get my head around it as I finish the full AST & tests. I think I have a similar confusion to @bobbinth, that it doesn't feel natural to me to represent it as a list comprehension, but that may change once I've gotten through everything.
@bobbinth It's important to note that the |
Consider a portion of the current grammar (i.e. before this PR):
The representation of these two things is virtually identical, but the body of the comprehension is defined in the |
Yeah, I'm sure it will be clearer once I get to the new grammar. I guess we essentially treat
as
|
Force-pushed from 517b684 to 5c5985d
@grjte For simple constraints there is no need to wrap them in a comprehension, so
The latter is parsed into comprehension form, as if it had been written as The result is that we end up with two types of constraints to deal with: simple, and comprehension. When |
Force-pushed from ac5fc13 to d64f8c7
This is another partial review. It covers almost everything except the parser test files and the semantic analysis, which I'll finish tomorrow. I think the new design makes sense, and the changes look great overall. Most of my comments are fairly small ones regarding clarity. At this point I don't anticipate any major change requests for this PR, so small issues aside, it should be pretty much ready to merge once I get through the rest of it.
/// The name of this trace segment, e.g. `$main`
///
/// NOTE: The name of a trace segment is always a special identifier (i.e. has the `$` prefix)
note for the future: I think we should get rid of the `$` for the trace segment names, since they're not actually globally accessible. Personally, I also think we should let people name these segments whatever they want, and it will only be the order that determines the trace segment. `main` and `aux` are very Winterfell-/Miden-specific names.
Definitely agree that `main` and `aux` should not be special in any way, but I think there is probably value in making the top-level trace segment names global/special identifiers, i.e. if you name a segment `foo`, you should be able to reference a specific column of `foo` directly as `$foo[N]`. That said, I suspect in practice such a feature wouldn't be used much if at all, so perhaps it is only useful when performing certain transformations on the AST internally in the compiler.
I guess my thinking was that a named trace segment `foo` is only accessible within the root module, whereas a named random values vector `rand` is actually globally accessible. Using `foo[i]` and `$rand[i]` distinguishes those things.
To be clear about what I meant - your example is how things work with regards to module-local bindings, i.e. within the root module, references to bindings in the `trace_columns` declaration use the `$`-less identifier syntax, e.g. `foo[i]`, as would referring to specific `random_values` bindings; but what I am suggesting is that within any module, you could always refer directly to the primary declaration (e.g. trace segment name) for both `random_values` and `trace_columns` segments using the `$`-prefixed form. These names are guaranteed to be unique, as we don't allow conflicts in top-level identifiers, and the `$` prefix guarantees they are distinct from other identifiers within any given module, so disambiguating them from the compiler side is trivial (this in fact happens automatically when constructing the map of identifiers and their "binding type" when starting semantic analysis). From the perspective of someone writing AirScript it is perhaps less clear, but it seems to me unlikely that there would be confusion about the names associated with `random_values` and the `trace_columns` segments from a readability point of view - but that's naturally debatable.

I guess the bottom line is that it is easy to support with how things work currently, and is useful for certain transformations on the AST. That said, we don't have to support it when actually writing AirScript, i.e. we disallow it when validating modules, but then allow it internally in the compiler. They are useful in that context because, given how identifiers are scoped, it is desirable to have an escape hatch in the form of global/fully-qualified identifiers.
Force-pushed from d0d4416 to eac0f51
This is another partial review which now mostly covers the parser tests. (Semantic analysis is still remaining)
One broad comment regarding the parser test macros: some of these are quite complex, and I would feel more comfortable if we had a couple of tests to cover those ones. Adding some docs & comments would help as well for the macros with many cases.
name: ResolvableIdentifier::Local(ident!($name)),
access_type: AccessType::Default,
offset: 0,
ty: Some($ty),
We should never be able to have a non-Felt type for a bounded symbol access (although maybe this is for testing invalid cases?)
Yeah that can probably be hardcoded here. If it ever comes up that we need to specify arbitrary types for certain tests, we can handle that as an additional case.
Ah, I know why this exists at the moment - for tests which do not run through semantic analysis, the AST nodes are untyped, so we need a case that is untyped, and one that is typed, even if that type is always the same. So I made `bounded_access` mirror `access` here.
Makes sense. It might be clearer for the future to add a comment which says that
Force-pushed from daa45ad to 9cb3060
This review completes all of the test files. There are a bunch of tests with invalid AirScript that was previously acceptable at the parsing stage and addressed in the IR (or in some cases not addressed yet, but with issues open). I've marked everything that is invalid or nonsensical, since most validation is now moving to the semantic analysis pass.
It probably makes sense to adjust the tests to valid AirScript during this PR, but the checks for each of these things could be part of the next PR(s) if you think that makes more sense.
integrity_constraints:

mod test

ev test([clk]):
nit: we can remove the comment above this that says "// the operation must be put into a source section, or parsing will fail", since it was explaining why the `integrity_constraints` section was used, and that's no longer relevant. It can be removed everywhere in these test files.
ident!(test),
vec![trace_segment!(0, "%0", [(clk, 1)])],
vec![enforce!(eq!(
    exp!(access!(clk, 1), add!(access!(clk), int!(2))),
I don't think we should allow this anymore. I don't think there's any good reason for allowing non-constant exponents. It was previously allowed in the parser and blocked in the IR, but now that we're moving errors out of the IR and handling them during parsing & semantic analysis we should probably cover this case here.
integrity_constraints:
    enf clk = 0
nit: it doesn't matter for the parser, but this is a useless integrity constraint. If we actually had something like this, we could just delete the entire column (since it would just be a column of 0s). It would be more realistic to do `clk' = clk + 1`.
/// ```
///
/// This is used as a common base for most tests in this module
fn test_module() -> Module {
nit: should we name this `base_module` so it's more obvious that it's just forming the AST of the `BASE_MODULE`? Either that, or maybe add a trivial test for equality between this and parsed `BASE_MODULE`? I know it's just a test, but I always like to avoid having things that are expected to be equivalent but which don't either depend on each other or assert that equivalence.
]);

ParseTest::new().expect_ast(source, expected);
enf x.first = 0 for x in inputs"
This isn't valid for a few reasons:
- Public inputs are just vectors of values, e.g. `[0, 1, 2, 3]`, so a notion of `inputs[0].first` is meaningless
- Constraints must be enforced against trace columns, so `enf inputs[0] = 2` is semantically like trying to do `enf 1 = 2`

The previous test was valid because it iterated over trace columns
SourceSpan::UNKNOWN,
vec![
    let_!(diff = lc!(((x, expr!(access!(c))), (y, expr!(access!(d)))) => sub!(access!(x), access!(y))).into() =>
        enforce!(eq!(bounded_access!(a, Boundary::First), add!(add!(add!(access!(x[0]), access!(x[1])), access!(x[2])), access!(x[3]))))),
note: `x` is undefined in this context, which was probably not intentional for this test
let_!(x = lc!(((c, expr!(slice!(c, 0..3)))) => access!(c)).into() =>
    enforce!(eq!(access!(a), add!(add!(add!(access!(x[0]), access!(x[1])), access!(x[2])), access!(x[3]))))),
note: this shouldn't be valid, because the range syntax is not inclusive, so `x` only has length 3
SourceSpan::UNKNOWN,
vec![
    let_!(diff = lc!(((x, expr!(access!(c))), (y, expr!(access!(d)))) => sub!(access!(x), access!(y))).into() =>
        enforce!(eq!(access!(a), add!(add!(add!(access!(x[0]), access!(x[1])), access!(x[2])), access!(x[3]))))),
note: `x` is not defined in this context, which was probably not intended for this test. It was probably supposed to be `diff`
vec![
    let_!(diff = lc!(((w, range!(0..3)), (x, expr!(access!(b))), (y, expr!(slice!(c, 0..3))), (z, expr!(slice!(d, 0..3)))) =>
        sub!(sub!(add!(access!(w), access!(x)), access!(y)), access!(z))).into() =>
        enforce!(eq!(access!(a), add!(add!(add!(access!(x[0]), access!(x[1])), access!(x[2])), access!(x[3]))))),
note: same issue with `x` being undefined. This was probably supposed to be `diff`
const B = [[1, 1], [2, 2]]

integrity_constraints:
    enf test_constraint(b)
this should be `enf test_constraint([b])` here and in the comment below. It shouldn't be valid to call an evaluator without the brackets that indicate the trace segment(s)
This commit introduces the following changes to the AST:
1. Change `Expr::Vector` to contain `Expr` elements rather than just `ScalarExpr` elements. This is necessary in order to properly represent the semantics of vectors in AirScript.
2. Introduce `Statement::EnforceIf` for use during translation to represent individual scalar constraints with runtime selectors. We do not use this variant until inlining is performed, to retain the benefits of the previous representation using `EnforceAll`.
3. Introduce `Statement::Expr` for use during translation to represent statements used in an expression context. Specifically, when inlining is performed on a let-bound expression, the inlined expression may require the use of let-bound variables, so we need a way to represent such statements during the inlining transformation.
Now that the `air-parser` crate handles the bulk of the semantic analysis and transformation passes, most of what is in the `air-ir` crate is no longer needed, or has been superseded in some way. This commit removes the unnecessary parts, and reorganizes what remains to simplify the structure of the crate. NOTE: This commit does not update the tests, or include the AST to IR lowering pass. That will follow in subsequent commits.
Force-pushed from f552d16 to 23e98d1
@hackaugusto I've updated the
The commit which modifies the MASM code generator is b1440c1, and the broken tests are:
I suspect we can just update the expected output to account for the changes to the IR lowering phase of the compiler, but don't want to do that without checking with you first. Let me know if you see any bugs.

@bobbinth @grjte This branch is now rebased against the latest code in |
Yeah, it is because of the new optimizations; it seems to be constant propagation:
and there seems to be a case of #314 in there. Thank you for handling the rebase!
Thanks @hackaugusto! You're right that this changeset includes a patch that orders declarations by name in the intermediate representation, which was done for determinism purposes, but it seems to also solve #314, so we may be able to close that out when this is merged.
Force-pushed from 23e98d1 to f08b2f8
NOTE: I've broken up this PR into a sequence of commits that are smaller/more focused, and in the order that will give you the best context as you review each of them. These commits do not compile individually, but all together they do compile, and pass all of the parser crate tests - they are simply broken up this way to make it easier to review. Please take note of the commit messages; they provide a lot of detail.
This PR refactors the parser crate to implement modules, while in the process simplifying the grammar and the syntax tree itself. It adds support for rich diagnostics to all AST nodes, and makes heavy use of the new diagnostics infrastructure during program compilation/validation. It implements a generic visitor for implementing passes over the AST, and uses this both for import resolution and semantic analysis, both of which are also implemented in this PR. As an aside, this PR also removes the few vestiges of `match enf` currently in the compiler, which we can add back once the discussion around its syntax and semantics has been pinned down.

Now, this PR is a draft PR because after doing some initial review of how this affects upstream crates, namely the IR, I did not want to dive into updating the IR crate (which does not compile on this branch) until these changes have been signed off on, and we've discussed how to proceed. Many of the things which the IR crate does in terms of validation and constructing state for the constraint graph are actually made redundant by the changes in this PR, and I'm unsure how we want to approach that going forward.

To describe a bit what I mean: when you invoke the parser after these changes, you are asking the parser to produce a `Program`. In the process of producing that, the parser does all of the important work of semantic analysis, and in particular, resolving identifiers to their definitions, propagating types, etc. This means that when you get a `Program`, it is still a relatively high-level representation, however all names have been resolved, most dead code has been eliminated, and it is ready for lowering to a form more amenable to optimization and execution (i.e. the IR). However, the IR (currently) is doing a lot of that work as it translates from the AST to the IR, which simply falls apart when modules are involved, and in general is not the ideal place in a compilation pipeline to do some of that work.

What I'd like to do to get the IR crate compiling again is first figure out how close the `Program` structure is to the "ideal" form for lowering to the IR, and what we need to translate directly from it to the IR. It may be the case that some of the information gathered (and subsequently discarded) during semantic analysis, such as the dependency graph, would actually be useful during lowering to the IR, and if so, we might want to defer invoking semantic analysis as part of the parser, and instead put it in the hands of the IR lowering pass. If, on the other hand, all of the information needed is already in the `Program` structure, then I suspect we can vastly simplify parts of the IR crate.